The genomic epidemiology of shigellosis in South Africa

Shigellosis, a leading cause of diarrhoeal mortality and morbidity globally, predominantly affects children under five years of age living in low- and middle-income countries. While whole genome sequence analysis (WGSA) has been effectively used to further our understanding of shigellosis epidemiology, antimicrobial resistance, and transmission, it has been under-utilised in sub-Saharan Africa. In this study, we applied WGSA to a large sub-sample of surveillance isolates from South Africa, collected from 2011 to 2015, focussing on Shigella flexneri 2a and Shigella sonnei. We find that each serotype is epidemiologically distinct. The four identified S. flexneri 2a clusters had distinct geographical distributions and distinct antimicrobial resistance (AMR) and virulence profiles, while the four sub-Clades of S. sonnei varied in virulence plasmid retention. Our results support serotype-specific lifestyles as a driver of epidemiological differences and show that AMR is not required for epidemiological success in S. flexneri and that the HIV epidemic may have promoted Shigella population expansion.


Reporting Summary
Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code
Data collection

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

nature portfolio | reporting summary
April 2023

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy

Research involving human participants, their data, or biological material
Policy information about studies with human participants or human data. See also policy information about sex, gender (identity/presentation), and sexual orientation and race, ethnicity and racism.
Reporting on sex and gender
Reporting on race, ethnicity, or other socially relevant groupings

Recruitment
Ethics oversight
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Data exclusions
Replication
Randomization
Blinding

All supporting data have been provided within the article or through supplementary data files. The study isolates' raw sequences have been deposited in the European Nucleotide Archive under project accession PRJEB55173, and individual isolate accession numbers can be found in the Supplementary Data. Accession numbers for reference isolates used in the study are also provided in the supplementary table.
All references to patients' hospital-recorded sex/gender have been reported as sex.
Race, ethnicity, and other socially relevant groupings are not discussed in our study.
Patients were sampled according to a standard case definition of moderate to severe diarrhoea: presenting at a public healthcare facility with a history of diarrhoea, defined as three or more loose stools per day, either with or without blood. Patients of all ages and sexes were included in the study.
Patients were recruited as part of public healthcare surveillance in South Africa. Coverage and access to healthcare is greater in urban areas than in rural areas, so the findings of our study may be less relevant in rural areas than in urban areas. Sampling was also conducted at public hospitals, which likely have a different patient demographic from private healthcare facilities in the region. It is therefore possible that our study was unable to detect shigellosis associated with certain demographics, such as the community of men who have sex with men: we found no phylogenetic signal suggestive of transmission within this community, but this is unlikely to mean that transmission is not occurring. Similarly, requiring patients to attend hospital means that less symptomatic patients will not have been sampled. Altogether, through these sampling methods we will have under-sampled some human sub-populations and potentially some Shigella sub-populations, and may therefore have missed important epidemiological factors affecting shigellosis in South Africa.
No individual consent was required or sought, as all isolates were collected as part of routine surveillance, and ethical approval for the use of patient data for public health activities was granted by the Human Research Ethics Committee of the University of the Witwatersrand (Protocol Numbers: M060449 and M110499).
No sample size calculation was performed as this was a descriptive pilot study. However, sample sizes were comparable to similar initial national studies describing pathogen genomic epidemiology.
Data were excluded from some statistical analyses where the necessary metadata were unknown. Additionally, four isolates were excluded from the BEAST2 analyses because they were outliers in molecular clock signal, as visually assessed in TempEst (v.1.5.3).
Analyses used publicly available software and standard statistical methods, with all non-default parameters/settings recorded.
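The clock-signal outliers mentioned above were identified visually in TempEst, which plots root-to-tip genetic distance against sampling date. As a rough illustration of the underlying idea only, and not the procedure used in the study, the sketch below fits a least-squares line to root-to-tip distance versus date and flags tips with unusually large residuals. The function name, the cutoff of 2.5 standard deviations, and the example values are all hypothetical.

```python
from statistics import mean, pstdev


def root_to_tip_outliers(dates, distances, z_cutoff=2.5):
    """Return indices of tips far from the root-to-tip regression line.

    dates: sampling dates as decimal years.
    distances: root-to-tip genetic distances (substitutions per site).
    z_cutoff: hypothetical threshold, in residual standard deviations.
    """
    mx, my = mean(dates), mean(distances)
    sxx = sum((x - mx) ** 2 for x in dates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    slope = sxy / sxx               # least-squares slope (clock rate estimate)
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(dates, distances)]
    sd = pstdev(residuals)
    if sd == 0:
        return []                   # perfect fit: nothing to flag
    return [i for i, r in enumerate(residuals) if abs(r) > z_cutoff * sd]


# Hypothetical data: eight clock-like tips and one divergent tip (last entry).
dates = [2011.0, 2011.5, 2012.0, 2012.5, 2013.5, 2014.0, 2014.5, 2015.0, 2013.0]
distances = [0.011, 0.0115, 0.012, 0.0125, 0.0135, 0.014, 0.0145, 0.015, 0.022]
flagged = root_to_tip_outliers(dates, distances)
```

A numeric cutoff like this is only a screening aid; visual inspection of the regression plot, as done in the study, remains the reference approach.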
Sub-sampling of isolates from the larger surveillance dataset was randomised. Random sub-sampling was achieved by selecting every 8th (S. flexneri 2a) or 5th (S. sonnei) isolate, based on SA lab number, from the database of collected surveillance isolates (collected from any body site, 1 January 2011 to 31 December 2015, for which demographic data were available). Associated patient metadata were used in further analyses and were therefore also randomised through the initial random selection.
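The selection rule described above (every 8th or 5th isolate, ordered by lab number) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the lab numbers are hypothetical, and the source does not say where in the ordered list the count began, so the sketch assumes it started at the first isolate.

```python
def systematic_subsample(lab_numbers, step):
    """Return every `step`-th isolate after ordering by lab number."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    ordered = sorted(lab_numbers)
    # Slices are 0-indexed, so the step-th item sits at index step - 1.
    return ordered[step - 1::step]


# Hypothetical lab numbers, deliberately out of order.
flexneri_2a = [17, 3, 9, 12, 1, 8, 14, 5, 16, 2, 11, 7, 4, 15, 10, 6, 13]
sonnei = [10, 2, 7, 4, 9, 1, 6, 3, 8, 5]

flexneri_2a_sample = systematic_subsample(flexneri_2a, step=8)  # every 8th
sonnei_sample = systematic_subsample(sonnei, step=5)            # every 5th
```

Selecting at a fixed interval is usually called systematic sampling; it approximates a random sample as long as the ordering key (here, lab number) is unrelated to the characteristics being studied.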
Blinding is not relevant for our study as no interventions were given.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.
Describe the methods by which all novel plant genotypes were produced. This includes those generated by transgenic approaches, gene editing, chemical/radiation-based mutagenesis and hybridization. For transgenic lines, describe the transformation method, the number of independent lines analyzed and the generation upon which experiments were performed. For gene-edited lines, describe the editor used, the endogenous sequence targeted for editing, the targeting guide RNA sequence (if applicable) and how the editor was applied.
Report on the source of all seed stocks or other plant material used. If applicable, state the seed stock centre and catalogue number. If plant specimens were collected from the field, describe the collection location, date and sampling procedures.
Describe any authentication procedures for each seed stock used or novel genotype generated. Describe any experiments used to assess the effect of a mutation and, where applicable, how potential secondary effects (e.g. second site T-DNA insertions, mosaicism, off-target gene editing) were examined.